While we visualize data as 2D grids for mathematical convenience, hardware sees only a contiguous 1D stream of bytes. Understanding this "linear reality" is a prerequisite for implementing row-wise reduction patterns, such as finding a row's maximum or the sum of its exponentials.
1. The "Linear Flattening" Principle
Every multi-dimensional tensor is physically stored sequentially. To implement $\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$, we must identify the linear segment representing a row and perform traversals to calculate the maximum and sum.
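The row-identification step above can be sketched in plain Python. This is an illustrative model, not a kernel: the shape (4×3) and the helper name `row_slice` are assumptions chosen for the example.

```python
# Sketch: a 4x3 "tensor" stored as one flat row-major list,
# mirroring the contiguous 1D stream the hardware actually sees.
flat = list(range(12))   # 12 elements, conceptually a 4x3 grid
rows, cols = 4, 3

def row_slice(r):
    """Return the linear segment holding row r: offsets [r*cols, (r+1)*cols)."""
    start = r * cols                  # linear offset of the row's first element
    return flat[start:start + cols]

# Row-wise reductions then traverse exactly this segment:
row = row_slice(2)                    # the third row: elements 6, 7, 8
row_max = max(row)                    # first reduction pass over the segment
```

The key invariant is that a row is a contiguous run of `cols` elements starting at offset `r * cols`; every pointer-arithmetic scheme in a real kernel is a variation on this computation.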
2. Numerical Stability
Why does softmax need stabilization? Large input values cause $e^{x}$ to overflow. We stabilize via: $$\text{exp}(x_i - \text{max}(x))$$ Subtracting the row maximum leaves the result mathematically unchanged (the factor $e^{-\max(x)}$ cancels between numerator and denominator) while bounding every exponent at zero. This forces the kernel designer to perform a two-pass linear reduction (max, then sum) before the final normalization.
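The two-pass scheme can be written as a reference implementation in pure Python. A minimal sketch over one row segment; the function name `stable_softmax_row` is an illustrative choice, not an established API.

```python
import math

def stable_softmax_row(row):
    """Numerically stable softmax over one linear row segment.
    Pass 1: max reduction.  Pass 2: sum of shifted exponentials.
    A final pass normalizes.  Reference sketch, not a GPU kernel.
    """
    m = max(row)                           # pass 1: linear max reduction
    exps = [math.exp(x - m) for x in row]  # shift bounds each exponent at 0
    s = sum(exps)                          # pass 2: linear sum reduction
    return [e / s for e in exps]
```

For instance, `stable_softmax_row([1000.0, 1000.0])` returns `[0.5, 0.5]`, whereas the naive `math.exp(1000.0)` overflows.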
3. Verification via Short Rows
When developing Triton kernels, we test with short rows only (e.g., width 16) to ensure our linear pointer arithmetic captures every element correctly before scaling to production workloads.
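A short-row verification harness might look like the following. This sketch mimics kernel-style explicit offset arithmetic over a flat buffer rather than invoking Triton itself; the width of 16 and the name `softmax_flat` are assumptions for the example.

```python
import math
import random

WIDTH = 16  # short row width for verification

def softmax_flat(buf, row, width):
    """Softmax over row `row` of a flat row-major buffer, using explicit
    linear offsets to mimic a kernel's pointer arithmetic."""
    base = row * width                                          # row's start offset
    m = max(buf[base + i] for i in range(width))                # pass 1: max
    s = sum(math.exp(buf[base + i] - m) for i in range(width))  # pass 2: sum
    return [math.exp(buf[base + i] - m) / s for i in range(width)]

random.seed(0)
buf = [random.uniform(-5.0, 5.0) for _ in range(2 * WIDTH)]  # two flat rows
out = softmax_flat(buf, 1, WIDTH)
assert abs(sum(out) - 1.0) < 1e-9  # each row must normalize to 1
```

With a width this small it is easy to check by hand that the base offset and the per-element offsets together touch exactly `width` elements, with no off-by-one at either boundary.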